Database of Mandarin Neighborhood Statistics

Authors

Karl Neergaard

Hongzhi Xu

Chu-Ren Huang

Abstract

In designing controlled experiments with language stimuli, researchers in psycholinguistics, neurolinguistics, and related fields require language resources that isolate variables known to affect language processing. This article describes a freely available database that provides word-level statistics for words and nonwords of Mandarin Chinese. The featured lexical statistics include subtitle corpus frequency, phonological neighborhood density, neighborhood frequency, and homophone density. The accompanying word descriptors include pinyin, ASCII phonetic transcription (SAMPA), lexical tone, syllable structure, dominant part of speech (PoS), and syllable, segment, and pinyin lengths for each phonological word. The database is designed for researchers concerned particularly with the processing of isolated words, and it accommodates multiple existing hypotheses about the structure of the Mandarin syllable. It is divided into multiple files according to two search criteria: 1) the syllable segmentation schema used to calculate density measures, and 2) whether the search is for words or nonwords. The database is open to the research community at https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics.
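As a rough illustration of two of the measures named in the abstract, the following Python sketch counts phonological neighbors and homophones on a toy lexicon. It assumes the common one-unit definition of a neighbor (a form that differs by one substitution, addition, or deletion), treats tone as one more unit, and writes segments in pinyin letters rather than SAMPA. The entries, the segmentation, and the function names are all hypothetical and are not taken from the database, whose files offer several segmentation schemas.

```python
from collections import Counter

# Toy lexicon (assumption): (written form, phonological form) pairs.
# Each phonological form is a tuple of units, with tone as the last unit.
ENTRIES = [
    ("妈", ("m", "a", "1")),
    ("马", ("m", "a", "3")),
    ("八", ("b", "a", "1")),
    ("他", ("t", "a", "1")),
    ("她", ("t", "a", "1")),
    ("它", ("t", "a", "1")),
    ("阿", ("a", "1")),
    ("猫", ("m", "a", "o", "1")),
    ("咪", ("m", "i", "1")),
    ("树", ("sh", "u", "4")),
]


def is_neighbor(a, b):
    """Return True if b differs from a by exactly one substitution,
    addition, or deletion of a single unit."""
    if a == b:
        return False
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: removing one unit from the longer form
    # must yield the shorter form (addition or deletion).
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_density(target, forms):
    """Count distinct phonological forms that are neighbors of target."""
    return sum(is_neighbor(target, form) for form in set(forms))


def homophone_counts(entries):
    """Count how many written forms share each phonological form."""
    return Counter(form for _, form in entries)


FORMS = [form for _, form in ENTRIES]
ma1_density = neighborhood_density(("m", "a", "1"), FORMS)
ta1_homophones = homophone_counts(ENTRIES)[("t", "a", "1")]
print(ma1_density, ta1_homophones)
```

Changing how a syllable is split into units (for example, treating a diphthong as one unit instead of two) changes which forms count as neighbors, which is why the choice of segmentation schema matters when density measures are compared.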


Similar resources

The Effects of Phonological Neighborhoods on Spoken Word Recognition in Mandarin Chinese

Pei-Tzu Tsai, Master of Arts, 2007. Directed by Professor Nan Bernstein Ratner and Associate Professor Rochelle Newman, Department of Hearing and Speech Sciences. Spoken word recognition is influenced by words similar to the target word with one phoneme...


The design and application of a speech database for Chinese TTS system

This paper presents the design and application of a speech database for a Mandarin TTS system. The main aim of the research is to build a scientific, versatile speech database that meets the call for improving the quality of synthesis units and enhancing previous prosodic models. The database structure and contents and the methodology for creating similar databases are described, and also s...


A Review of Statistics and Probability Journals in ISI Database

As scientific output indexed in the ISI database and related databases has increased in recent years, researchers of Statistics in Iran would benefit from knowing more about these journals and their status in the ISI database. In this study, using bibliometric methods, we review the status of Statistics and Probability journals. From all nations around the world, these are only...


A Prosodic Labeling System for Mandarin Speech Database

A working database needs tools to transcribe and label at both phonetic and prosodic levels. While the proposed phonetic transcription system is a simplified form of the International Phonetic Alphabet (IPA) following the SAMPA guidelines, the prosodic labeling system is an elaborated form of the ToBI (Tone and Break Indices) framework adopted for Mandarin. In particular, the proposed prosodic ...


Multi-accented Mandarin Database Construction and Benchmark Evaluations

In this paper, we describe the design, recording, and checking procedures of a multi-accented Mandarin speech database, and present a benchmark evaluation of this database. The database was recorded in 6 cities in China and contains 1200 speakers' accented Mandarin speech of continuous digits, isolated words, and sentences. In total, 520k utterances (572.5 hours) were collected. We perform the in...



Publication date: 2016
